Tootfinder

Opt-in global Mastodon full-text search. Join the index!

No exact results. Similar results found.
@NFL@darktundra.xyz
2024-04-01 22:40:44

Chiefs WR Rashee Rice cooperating with authorities following reported crash, lawyer says nfl.com/news/chiefs-wr-rashee-

@arXiv_csMA_bot@mastoxiv.page
2024-03-06 06:50:54

A Multi-agent Reinforcement Learning Study of Evolution of Communication and Teaching under Libertarian and Utilitarian Governing Systems
Aslan S. Dizaji
arxiv.org/abs/2403.02369

@privacity@social.linux.pizza
2024-04-28 18:34:12

Corral Some Zippy Blue Flames Into 3D Printed Troughs
poliverso.org/display/0477a01e
[Steve Mould] came across an interesting little phenomenon of blue flames zi…

@arXiv_physicsappph_bot@mastoxiv.page
2024-05-06 07:23:17

Reversible single-pulse laser-induced phase change of Sb₂S₃ thin films: multi-physics modeling and experimental demonstrations
Capucine Laprais, Clément Zrounba, Julien Bouvier, Nicholas Blanchard, Matthieu Bugnet, Yael Gutiérrez, Saul Vazquez-Miranda, Shirly Espinoza, Peter Thiesen, Romain Bourrellier, Aziz Benamrouche, Nicolas Baboux, Guillaume Saint-Girons, Lotfi Berguiga, Sébastien Cueff

@benb@osintua.eu
2024-05-02 07:53:10

Military intelligence: Russia flies attack drones over occupied Zaporizhzhia plant, video shows: benborges.xyz/2024/05/02/milit

@arXiv_mathAP_bot@mastoxiv.page
2024-05-06 07:21:49

Master equations with indefinite nonlinearities
Wenxiong Chen, Yahong Guo
arxiv.org/abs/2405.02091 arxiv.org/pdf/2405…

@arXiv_csIR_bot@mastoxiv.page
2024-05-03 06:50:16

Multi-intent-aware Session-based Recommendation
Minjin Choi, Hye-young Kim, Hyunsouk Cho, Jongwuk Lee
arxiv.org/abs/2405.00986

@arXiv_quantph_bot@mastoxiv.page
2024-05-02 08:42:52

This arxiv.org/abs/2403.11893 has been replaced.
initial toot: mastoxiv.page/@arXiv_qu…

@arXiv_csCL_bot@mastoxiv.page
2024-05-01 06:49:12

Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve
arxiv.org/abs/2404.19737 arxiv.org/pdf/2404.19737
arXiv:2404.19737v1 Announce Type: new
Abstract: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared model trunk. Considering multi-token prediction as an auxiliary training task, we measure improved downstream capabilities with no overhead in training time for both code and natural language models. The method is increasingly useful for larger model sizes, and keeps its appeal when training for multiple epochs. Gains are especially pronounced on generative benchmarks like coding, where our models consistently outperform strong baselines by several percentage points. Our 13B parameter models solve 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models. Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities. As an additional benefit, models trained with 4-token prediction are up to 3 times faster at inference, even with large batch sizes.
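
For readers who want to see the shape of the idea, here is a minimal PyTorch sketch of the setup the abstract describes: a shared trunk feeding n independent output heads, trained with an auxiliary multi-token loss. The names (MultiTokenPredictor, multi_token_loss) and the trunk interface are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    """Shared trunk + n independent output heads, one per future token (sketch)."""

    def __init__(self, trunk: nn.Module, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.trunk = trunk  # assumed to map token ids -> (batch, seq, d_model)
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )
        self.n_future = n_future

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.trunk(tokens)  # (batch, seq, d_model)
        # Head i predicts token t + 1 + i from the hidden state at position t.
        return torch.stack([head(h) for head in self.heads], dim=2)  # (batch, seq, n_future, vocab)

def multi_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Average cross-entropy over the n future-token targets."""
    batch, seq, n_future, vocab = logits.shape
    usable = seq - n_future  # positions where all n targets exist
    loss = torch.zeros((), device=logits.device)
    for i in range(n_future):
        target = tokens[:, 1 + i : 1 + i + usable]          # (batch, usable)
        pred = logits[:, :usable, i, :].reshape(-1, vocab)  # (batch*usable, vocab)
        loss = loss + F.cross_entropy(pred, target.reshape(-1))
    return loss / n_future
```

Because the abstract frames multi-token prediction as an auxiliary training task, the extra heads can simply be dropped at inference time, falling back to ordinary next-token decoding with the shared trunk and the first head.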

@arXiv_csIR_bot@mastoxiv.page
2024-03-27 06:50:17

Masked Multi-Domain Network: Multi-Type and Multi-Scenario Conversion Rate Prediction with a Single Model
Wentao Ouyang, Xiuwu Zhang, Chaofeng Guo, Shukui Ren, Yupei Sui, Kun Zhang, Jinmei Luo, Yunfeng Chen, Dongbo Xu, Xiangzheng Liu, Yanlong Du
arxiv.org/abs/2403.17425